Breakdown of thermodynamic equilibrium for DNA hybridization in microarrays 



J. Hooyberghs, 1,2,3 M. Baicsi, 2 A. Ferrantini, 2 and E. Carlon 2 

1 Flemish Institute for Technological Research (VITO), Boeretang 200, B-2400 Mol, Belgium 
'^Institute for Theoretical Physics, KULeuven, Celestijnenlaan 200D, B-3001 Leuven 
,3 Hasselt University, Campus Diepenbeek, B-3590 Diepenbeek, Belgium 
(Dated: January 14, 2010) 

Test experiments of hybridization in DNA microarrays show systematic deviations from the equi- 
librium isotherms. We argue that these deviations are due to the presence of a partially hybridized 
long-lived state, which we include in a kinetic model. Experiments confirm the model predictions for 
the intensity vs. free energy behavior. The existence of slow relaxation phenomena has important 
consequences for the specificity of microarrays as devices for the detection of a target sequence from 
a complex mixture of nucleic acids. 
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DNA hybridization (the binding of two strands to form 
a double helix) in bulk solution has been extensively stud- 
ied in the past [l[ . In this letter we discuss hybridization 
in DNA microarrays. Microarrays are high throughput 
devices which have been widely used to measure the ac- 
tivity of genes at a genome wide level. In a microarray 
singled stranded DNAs are arrayed on a solid substrate in 
spots, each containing a specific sequence. Hybridization 
takes place between surface-bound sequences (referred to 
as probes) and sequences in solution (targets) carrying a 
fluorophorc. The amount of hybridized target is obtained 
from the emitted fluorescence from a given spot. Al- 
though hybridization in microarrays has attracted some 
interest in recent years its physical properties are still 
poorly understood (for reviews on the topic see, e.g., 0]). 
We demonstrate here that, contrary to a widespread be- 
lief, in DNA microarrays equilibration times may largely 
exceed typical experimental times. These claims arc 
based on experimental results and are corroborated by 
the analysis of a kinetic model. 
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FIG. 1: Plot of I/c as a function of AAG for four different 
experiments at concentrations c = 2, 10, 50 and 250 pM. The 
hybridization time is of 17 h. The "collapse" of the four data 
sets into a single curve demonstrates that I oc c in the whole 
intensity range. Deviations from the equilibrium isotherm 
observed at high intensities. Inset: plot of 
each individual concentration. 



TABLE I: Target and probe sequences used in the experi- 
ments. All sequences have a 5' to 3' orientation. Target se- 
quences have a 20-mer poly(A) stretch attached to their 3' 
end, which terminates with a Cy3 fluorophore. Probes have 
a 30-mer poly(A) stretch at their 3' end, which is covalently 
linked to the microarray surface. 

Target sequences in solution 

1 . CTTTGTCG AGCTGGTATTTGGAGAAC ACGT 

2. TCGAGCTGGTATTTGGAGAACACGT 

Probes at the microarray surface 

PM ACGTGTTCTCCAAATACCAGCTCGACAAAG 
ACGTGATCTCCAAATACCAGCTCGACAAAG 

1MM ACGTGCTCTCCAAATACCAGCTCGACAAAG 
ACGTGGTCTCCAAATACCAGCTCGACAAAG 

ACGTGATCTCCCAATACCAGCTCGACAAAG 



The experimental setup is shown in Table U and ex- 
tends that of Ref. 0). A single sequence is present in 
solution: cither a 30-mer or a 25-mer. The surface probe 
sequences are perfect matching, with one or two mis- 
matches. The mismatches can be of different nature and 
they are at different positions along the sequence [H . In 
total there are 1006 different probe sequences Cus- 
tom arrays containing spots with the probe sequences of 
Table U were purchased from Agilent Technologies. We 
used 15K slides which accommodate 15 replicas of the 
1006 sequences. The analysis is performed on the me- 
dian intensities over the replicas. The standard Agilent 
protocol (except for target fragmentation) and Agilent 
buffers [B| were used. The temperature is 65° C. 

Figure [T] shows a plot of I/c (the intensity divided by 
the target concentration) vs. AAG for four experiments 
at different concentrations using the setup of Table U (for 
the 30-mer target). The variable AAG = AG PM - AG 
is the difference in hybridization free energies between a 



10' 

l() 4 
Iio 5 
io 2 
Id' 



a) 



-7 



10' 

io 4 

I IO 3 

io 2 

LO 1 



t = 40min -, 



i Iio 3 



i i i i i i i i 

-5 -4 -3 -2 -1 



lb) 1 


1 1 1 1 1 >" 1 1 ! 








t = 3h24min T 


'■1,1 


, i , i , i , i , 



c) 


1 1 1 \/ 1 1 E 








9T • kinetic model = 


, i 


t=17h 1 

i,i,i,i,' 




-6 -5 -4 -3 -2 -1 -7 -6 -5 -4 -3 -2 -1 

AAG (kcal/mol) AAG (kcal/mol) 




-6 -5 -4 -3 -2 -1 

AAG (kcal/mol) 



6 -5 -4 -3 -2 

AAG (kcal/mol) 



FIG. 2: Plot of intensity vs. AAG four experiments at differ- 
ent times with a 30-mer target and c = 50 pM. Dashed lines 
have slope 1/RT, dotted lines have slope j/RT with 7 = 0.32. 
Solid lines are obtained from the solution of Eqs. ([2]) and © 
using as parameters fci = 10 5 Af _1 s _1 , = Is -1 , 7 = 0.32 
and AGpm = —14.5 kcal/mol. 



given sequence and the perfect match (PM) sequence. It 
is calculated from the nearest-neighbor parameters ob- 
tained from the analysis of microarray data, as discussed 
in Ref. Q (the nearest-neighbor model assumes that AG 
can be written as a sum of dinucleotide terms Fig- 
ure Q] shows that the intensity is proportional to the con- 
centration for four orders of magnitude in /. In the low 
c limit, equilibrium thermodynamics predicts that 



Ace 



-AG/RT 



(1) 



where A sets the intensity scale, R is the gas con- 
stant and T the temperature. Eq. ([1]) is obtained from 
the c —y limit of the Langmuir isotherm (which was 
used in microarray data analysis jil-Hoj). but also from 
isotherms in which electrostatic effects are taken into ac- 
count (electrostatic effects were experimentally ob- 
served at low ionic strengths 12| ) . In the latter AG con- 
tains a contribution from electrostatic interactions. For 
both isotherms, in the I oc c regime one expects a linear 
dependence of log / on AG (or AAG). Fig. [T]shows that 
the experimental data are only in partial agreement with 
Eq. ([T]), which is drawn as a dashed line in the figure. 

Wc then extended the analysis at different hybridiza- 
tion times. Figure [5] shows a plot of / vs. AAG for a 
30-mer target at four different times and for a concen- 
tration of 50 pM (the 17 h hybridization data are those 
already shown in Fig.[T]). Once the desired hybridization 
time has been reached the experiment is stopped, the 
microarray washed and scanned to measure the emitted 
fluorescence from every spot. Experiments at different 
hybridization times thus require different slides. As the 
hybridization time increases, a larger fraction of the data 
aligns along a line with a slope 1/RT, which shows that 



FIG. 3: As in Fig. [2] for a target sequence of length 25. The 
target concentration is c = 500pM. 



the observed deviations from Eq. ([1} are due to the break- 
down of thermodynamic equilibrium. Surprisingly, full 
equilibrium has not been reached here even after 86h. In 
Fig. [Ub,c,d) the data for log/ align along two slopes: 
in the equilibrium regime the slope is 1/RT (Eq. ([T])). 
in the non-equilibrium regime the slope is smaller and 
appears to be constant in the course of time [13| . Hy- 
bridization data for the shorter target sequence (25-mer) 
are shown in Fig. [3] The agreement with Eq. ([T]) is over 
three orders of magnitude in the intensity scale at times 
> 3h. Hence equilibration is much faster for the shorter 
sequences. The experimental setup allows a detection of 
non-equilibrium effects as deviation from the 1/RT line 
without the need, in principle, of a time series analysis. 

Hybridization of oligonucleotides in solution is usually 
described as a two state process. However, as will be 
shown, a two-state process cannot be reconciled with the 
experimental data shown in Fig. [2j During manufactur- 
ing the probes are tethered to the surface and can form 
a dense layer that slows down hybridization. Some of 
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these effects have been discussed in Refs. 14, |15J. The 
typical distance between probes is 10 nm, and the length 
of a fully stretched 30-mer duplex is 10 nm and its thick- 
ness of 2 nm. Probe sequences in the experiment have 
also a poly (A) 30-mer spacer (see Table Uj). Therefore a 
single target molecule can interact with more than one 
probe. Taking this into account, we have extended the 
two state hybridization model with an additional inter- 
mediate state (Fig. [4j. Indicating with d\ and 9 2 , the 
fraction of partially and fully hybridized probes on a mi- 
croarray spot, the kinetics of these reactions is given by 



dgi 
dt 
d02 
dt 



cfci(i - e 1 - e 2 ) + k_ 2 e 2 - (fc_ x + k 2 )e 1 (2) 

fc 2 0i - k„ 2 2 (3) 
where c is the target concentration in solution and fci, 
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FIG. 4: The three state model for hybridization in DNA mi- 
croarrays is specified by the four rate constants. 



k-i, k 2 and k^ 2 the four rates involved (see Fig. @|. For 
simplicity we have assumed that at most a single target 
molecule can bind to a given probe. 9\ is the average 
occupation fraction over several configurations in which 
partial binding can occur at different positions of the 
probe sequence. 

The rate constants, using a two state model descrip- 
tion, have been measured in several microarray experi- 
ments. In Rcf. 



16j the hybridization of a common tar- 



get sequence to a perfect match probe and to a probe 
containing one mismatch were considered. The following 
rates were measured (at 45° C): k[ 



(PM) 



(MM) 



io 4 Ar 



k 



(PM) 
1 



12 



and 



.(MM) 



21 

— 29 • 10 s -1 , While there is more than a fac- 
tor two of difference in the detachment rates, the at- 
tachment rates differ only by 10%. These results are in 
agreement with observation for kinetic behavior in bulk 
solution 18(. The probes in our experiment differ by at 
most two nucleotides out of 30. We will then take k\ as 
sequence independent. Consider now the reaction rate 
k 2 . In the partially hybridized state the target strand 
binds over a stretch of nucleotides with one probe se- 
quence (primary contact) but it can also bind to a second 
neighboring probe (secondary contact). The rate limit- 
ing step is the unbinding from secondary contacts and 
the strand contraction so that the target probe can over- 
come steric hindrance and wind up into a fully formed 
helix all along its length, k 2 will depend on target length 
and probe length and density. We will assume k 2 to be 
the same for the probes of Table HJ The reverse rates 
are then fixed by the thermodynamics relations 

k-x = k ie AG '? RT , fc_ 2 = k 2 e (AG - AG '^ RT (4) 

where AG" and AG are the free energy differences be- 
tween configurations 1 and 2, and the unbound state, 
respectively. Next we link AG' to AG. Weak total bind- 
ing (small |AG|) caused by the presence of multiple mis- 
matches should also correspond to weak partial binding 
(small | AG' |). As a simple approximation we will assume 
that the two free energies are monotonically linked as 



The model is thus characterized by ki, k 2 and 7. 

To gain some more insight we consider the limit of fast 
equilibration for Eq. © . First we obtain the equilibrium 
value for 9\ by setting the right hand sides of Eqs. © 
and © to zero in the limit ce~ AG/RT < 1. We then 
solve Eq. ([3]) replacing for 9\ its equilibrium value 9± = 
ce~ AG l RT . Setting the initial condition 6*2 (0) = we get 



*(*) 



ce 



-AG/RT 



0-'-" ) 



(6) 



T -l = k _ 2 = k2e (AG-AG')/RT = fc2e (l- 7 )AG/flT (?) 

The relaxation time, r, depends on AG: weakly bounded 
sequences (small |AG|) equilibrate faster than strongly 
bounded ones (large |AG|). For fast equilibrating se- 
quences (t <C i) one recovers Eq. ([]} from Eq. ([6]); for 
sequences with long equilibration times r>f we expand 
Eq. © to lowest order in t/r. With this approximation 
we find that for a given time t 



f ce~ AG / RT \AG\ « I AG* 
21 ' \ ctk 2 e^ AG ' RT |AG| » I AG* 



(8) 



AG' « 7AG 



( 7 <1). 



(5) 



where AG* is a crossover free energy that depends on 
time and is obtained by setting t = t in Eq. (|7|). After 
hybridization the slides undergo washing steps according 
to the standard Agilent protocol, which are expected to 
remove weakly bound target molecules from the slide and 
have been also included in thermodynamics models of 
arrays fl9j . In the present setup there is only one target 
sequence in solution and washing is likely to affect the 
partial hybridized state. In this case one can assume 
that the measured intensity is given hy I ~ A9 2 (in the 
model the typical free energies of the partially hybridized 
states are such that 61 <C 9 2 ). Equation ((SJ reproduces 
the two slopes in the log / vs. AAG plots as seen in the 
experiments (Fig. [IJ. It shows that the non-equilibrium 
regime is characterized by a slope equal to j/RT. 

The solid lines in Fig. [^b,c,d) are plots of the inten- 
sity / = A9 2 obtained from the solution of Eqs. ([2]) and 
([3]). The parameters used are given in the caption of 
Fig. [3J For the choice of parameters given, the fast equi- 
libration limit (Eq.®) approximates very well the full 
solution of Eqs. © and ([3]). There is a reasonable agree- 
ment between the experiments and the kinetic model. We 
note though that the crossover between the two regimes 
is somewhat sharper in experiments. In addition, the 
experimental / vs. AAG data show a slight sigmoidal 
trend which is not present in the kinetic model. Within 
the two state model kinetics, and using the assumption 
k\ as sequence independent, one arrives to a solution sim- 
ilar to Eq. (jSJ) although with 7 = 0. Therefore the two 
state model cannot account for a second finite slope as 
observed in the experiments. Note that the fit of the 
kinetic model to the data requires also an estimate of 
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AGpm, as the method of Ref. [1] does not provide abso- 
lute AG for a sequence but only AAG, i.e. differences in 
free energies with respect to a perfect match hybridiza- 
tion. In addition, in the fit we adjusted the constant A 
(I = AO 2) as we note in the experimental data a global 
decrease of the intensity scale. This is probably due to 
some degradation of the fluorophores or of the target and 
probe strands (as depurination, the loss of purines, which 
alters the binding energies of the involved strands). The 
overall decresc in intensity occurs both for the 25-mer and 
the 30-mer sequences. In the latter the effect is somewhat 
stronger, especially at longer hybridization times. 

Eq. ([7]) predicts that the relaxation time is a function 
of AG, AG' and k 2 ■ We can use this equation to compare 
the ratio between the times for a L = 30 and L = 25 tar- 
gets. Consider the same probe sequence hybridizing to 
the two targets. Assuming AG'(L = 30) w AG'(L = 25) 
and using as estimate of the difference in binding en- 
ergies between AG(L = 30) - AG(L = 25) « -2.5 
kcal/mol (T3], we get from Eq. a decrease of a factor 
exp(2.5/ RT) ss 40 in the relaxation time. In addition one 
also expects k 2 (L = 25) > k- 2 (L = 30) which decreases 
the relaxation time even further. The similarities be- 
tween Fig. [2(c) and Fig. [3(a) suggest that the relaxation 
time ratio between 25-mers and 30-mers is approximately 
25, which is the same order of magnitude just obtained 
from Eq. ([?])■ Since typical biological experiments in- 
volve target strands of lengths 30-50, the breakdown of 
equilibrium shown here may occur in many different mi- 
croarrays platforms and in biological experiments, and 
could involve even longer relaxation times. 

Summarizing: We showed that hybridization in DNA 
microarrays under standard conditions is characterized 
by relaxation times which may largely exceed the ex- 
perimental time. In the non equilibrium regime the 
intensities are distributed as e -^ AG / RT ^ with 7 < 1. 
This is equivalent to introducing an effective tempera- 
ture T c ff = T/7 > T. Interestingly effective temperatures 
were used as adjustable phenomenological parameters to 
fit biological microarray data 0, Q . This work provides 
an insight on the origin of these. 

The breakdown of equilibrium implies lower specificity 
of the microarrays as devices for the detection of a de- 
sired sequence from a complex mixture. To see this con- 
sider a probe at the microarray surface and two sequences 
at equal concentration in solution: one perfect matching 
with the probe and one with a mismatch. In the equilib- 
rium regime the two sequences hybridize to the probe 
with a probability ratio e (&G PM -AG U M)/RT w .05, 
where we have used a typical value AG mm — AGpm ~ 
2 kcal/mol Q and a temperature of T = 65° C. In the 
nonequilibrium regime, due to the presence of a factor 
7 < 1 in the exponential the ratio is about 0.4 (tak- 
ing 7 = 0.32). Therefore in the non-equilibrium regime 
a significant fraction of a measured signal may be due 
to hybridization to non-complementary targets, a phe- 



nomenon known as cross-hybridization. For an optimal 
functioning of the microarrays it is then desirable to work 
under equilibrium conditions Q . Several parameters may 
influence the relaxation time as temperature, salt and 
buffer conditions. The experimental setup discussed in 
this paper provides a good test of equilibrium (single line 
vs. broken line in a / vs. AAG plot) and can be used to 
investigate the best working conditions for hybridization. 
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